The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations

Authors

  • Johan Bos
  • Kilian Evang
  • Johannes Bjerva
  • Lasha Abzianidze
  • Hessel Haagsma
  • Rik van Noord
  • Pierre Ludmann
  • Duc-Duy Nguyen
Abstract

The Parallel Meaning Bank is a corpus of translations annotated with shared, formal meaning representations. It comprises over 11 million words across four languages (English, German, Italian, and Dutch). Our approach is based on cross-lingual projection: automatically produced (and manually corrected) semantic annotations for English sentences are mapped onto their word-aligned translations, assuming that the translations are meaning-preserving. The semantic annotation consists of five main steps: (i) segmentation of the text into sentences and lexical items; (ii) syntactic parsing with Combinatory Categorial Grammar; (iii) universal semantic tagging; (iv) symbolization; and (v) compositional semantic analysis based on Discourse Representation Theory. These steps are performed using statistical models trained in a semi-supervised manner. The employed annotation models are all language-neutral. Our first results are promising.
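The cross-lingual projection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `project_tags`, the tag labels, and the toy word alignment are all hypothetical, and the real pipeline projects richer annotations than one tag per token. The sketch only shows how token-level labels on an English sentence could be copied onto a translation through word-alignment links, under the assumption that the translation preserves meaning.

```python
from typing import List, Optional, Tuple


def project_tags(
    source_tags: List[str],
    alignment: List[Tuple[int, int]],
    target_length: int,
) -> List[Optional[str]]:
    """Copy each source token's tag to the target token aligned with it.

    `alignment` holds (source_index, target_index) pairs. Target tokens
    with no alignment link keep the value None.
    """
    projected: List[Optional[str]] = [None] * target_length
    for src_idx, tgt_idx in alignment:
        # Keep the first tag that reaches a target token; later links
        # to the same target token are ignored in this simple sketch.
        if projected[tgt_idx] is None:
            projected[tgt_idx] = source_tags[src_idx]
    return projected


# Toy example (hypothetical labels, not the corpus's actual tag set).
english = ["She", "did", "not", "sleep"]
dutch = ["Zij", "sliep", "niet"]
english_tags = ["PRONOUN", "PAST", "NEGATION", "EVENT"]

# Hypothetical word alignment: She-Zij, sleep-sliep, not-niet.
# "did" has no counterpart in the Dutch sentence.
alignment = [(0, 0), (3, 1), (2, 2)]

dutch_tags = project_tags(english_tags, alignment, len(dutch))
for token, tag in zip(dutch, dutch_tags):
    print(token, tag)
```

Unaligned target tokens are left as `None` here, so that gaps in the alignment stay visible rather than being filled with a guess.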


Similar resources

Evaluating Scoped Meaning Representations

Semantic parsing offers many opportunities to improve natural language understanding. We present a semantically annotated parallel corpus for English, German, Italian, and Dutch where sentences are aligned with scoped meaning representations in order to capture the semantics of negation, modals, quantification, and presupposition triggers. The semantic formalism is based on Discourse Representa...


Optimality in Analysis, Generation, and Learning: Towards a Robust Computational Architecture for Corpus-based Studies of Syntax

This paper describes a computational architecture for accessing implicit information about the grammar of the languages included in a parallel corpus and exploiting it in an Optimality Theory-style learning approach. Previous work on OT learning presupposes the existence of training data in which the underlying input has been annotated. This is an idealization that does not reflect the natural l...


Optimality in Analysis, Generation and Learning: towards a Robust Computational Architecture for Corpus-based Studies of Syntax

This paper describes a computational architecture for accessing implicit information about the grammar of the languages included in a parallel corpus and exploiting it in an Optimality Theory-style learning approach. Previous work on OT learning presupposes the existence of training data in which the underlying input has been annotated. This is an idealization that does not reflect the natural ...


Multilingual Distributed Representations without Word Alignment

Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP. By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not available in discrete representations, distributed representations have proven useful in many NLP tasks. Recent work has shown how compositional semantic repres...


Distributed representations for compositional semantics

The mathematical representation of semantics is a key issue for Natural Language Processing (NLP). A lot of research has been devoted to finding ways of representing the semantics of individual words in vector spaces. Distributional approaches—meaning distributed representations that exploit co-occurrence statistics of large corpora—have proved popular and successful across a number of tasks. H...




Publication date: 2017